@AlexRapatij AlexRapatij commented Jan 27, 2026

Description (*)

On a fairly large database, the analytics_collect_data cron job takes too long to execute. On one real project, the most recent run took ~36 hours.
According to the investigation, the cause is the validation of the report definition query. The validation query is given a limit of 0, which produces a final SQL query with no LIMIT clause at all. As a result, all of the table data is loaded.
The solution is simple: add a limit of 1 to the validation query.
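
The effect described above can be illustrated with a minimal sketch. It assumes, as the description reports, that a count of 0 is treated as "no limit set" when the query is rendered. The function name renderValidationQuery is hypothetical and is not part of Magento or its database layer; it only models the observed behaviour.

```php
<?php
declare(strict_types=1);

/**
 * Simplified model of how a select builder might render a LIMIT clause.
 * Hypothetical helper for illustration only; not Magento framework code.
 */
function renderValidationQuery(string $sql, int $limit): string
{
    // A count of 0 is treated as "no limit set", so nothing is appended
    // and the database would read every row of the table.
    if ($limit > 0) {
        $sql .= ' LIMIT ' . $limit;
    }

    return $sql;
}

$base = 'SELECT `store`.`store_id` FROM `store`';

echo renderValidationQuery($base, 0), PHP_EOL; // rendered without a LIMIT clause
echo renderValidationQuery($base, 1), PHP_EOL; // rendered with LIMIT 1 appended
```

With a limit of 1 the query is still parsed and executed, so an invalid report definition should still fail validation, but at most one row is fetched.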

Manual testing scenarios (*)

  1. Add temporary custom logging before the query call (line 55). Unfortunately, bin/magento dev:query-log:enable doesn't log such queries.
// Example of logging
try {
    \Magento\Framework\App\ObjectManager::getInstance()->get(\Psr\Log\LoggerInterface::class)
        ->info('Report validation query: ' . $query->getSelect()->__toString());
    $connection->query($query->getSelect());
} catch (\Zend_Db_Statement_Exception $e) {
    return [$name, $e->getMessage()];
}
  2. Execute the analytics_collect_data cron job
php n98-magerun2.phar sys:cron:run analytics_collect_data
  3. Check the logs
grep 'Report validation query' var/log/system.log

Examples

Before change

These are the queries before the change was applied:

SELECT `catalog_category_entity_text`.`value` AS `content` FROM `catalog_category_entity_text`
SELECT `catalog_product_entity_text`.`value` AS `content`, `eav_attribute`.`attribute_code` FROM `catalog_product_entity_text`
SELECT `cms_page`.`content` FROM `cms_page`
SELECT `cms_block`.`content` FROM `cms_block`
SELECT `setup_module`.`module` AS `module_name`, `setup_module`.`schema_version`, `setup_module`.`data_version` FROM `setup_module` 
SELECT `store`.`store_id`, `store`.`code`, `store`.`group_id`, `store`.`name`, `store`.`is_active` FROM `store` 
SELECT `store_website`.`website_id`, `store_website`.`code`, `store_website`.`name`, `store_website`.`default_group_id`, `store_website`.`is_default` FROM `store_website` 
SELECT `store_group`.`group_id`, `store_group`.`website_id`, `store_group`.`name`, `store_group`.`default_store_id` FROM `store_group` 
SELECT `catalog_product_entity`.`entity_id`, `catalog_product_entity`.`sku` FROM `catalog_product_entity` WHERE (catalog_product_entity.created_in <= '1748965680') AND (catalog_product_entity.updated_in > '1748965680') 
SELECT `magento_banner_content`.`banner_content` AS `content` FROM `magento_banner_content`
SELECT `quote`.`entity_id`, `quote`.`customer_id`, `quote`.`store_id`, `quote`.`created_at`, `quote`.`converted_at`, `quote`.`is_active`, `quote`.`items_count`, `quote`.`items_qty`, `quote`.`orig_order_id` FROM `quote` 
SELECT `review`.`review_id`, `review`.`created_at`, `review`.`entity_pk_value` FROM `review` 
SELECT `rating_option_vote_aggregated`.`primary_id`, `rating_option_vote_aggregated`.`entity_pk_value`, `rating_option_vote_aggregated`.`store_id`, `rating_option_vote_aggregated`.`rating_id`, `rating_option_vote_aggregated`.`percent_approved` FROM `rating_option_vote_aggregated` 
SELECT `sales_order`.`entity_id`, `sales_order`.`created_at`, `sales_order`.`customer_id`, `sales_order`.`status`, `sales_order`.`base_grand_total`, `sales_order`.`base_tax_amount`, `sales_order`.`base_shipping_amount`, SHA1(`sales_order`.`coupon_code`) AS `coupon_code`, `sales_order`.`store_id`, `sales_order`.`store_name`, `sales_order`.`base_discount_amount`, `sales_order`.`base_subtotal`, `sales_order`.`base_total_refunded`, `sales_order`.`shipping_method`, `sales_order`.`shipping_address_id`, SHA1(`sales_order`.`customer_email`) AS `customer_email`, `sales_order`.`base_total_online_refunded`, `sales_order`.`base_total_offline_refunded`, `sales_order`.`base_currency_code`, `sales_order`.`billing_address_id` FROM `sales_order` 
SELECT `sales_order_item`.`item_id`, `sales_order_item`.`created_at`, `sales_order_item`.`name`, `sales_order_item`.`base_price`, `sales_order_item`.`qty_ordered`, `sales_order_item`.`order_id`, `sales_order_item`.`sku`, `sales_order_item`.`product_id`, `sales_order_item`.`store_id` FROM `sales_order_item` 
SELECT `sales_order_address`.`entity_id`, `sales_order_address`.`customer_id`, `sales_order_address`.`city`, `sales_order_address`.`region`, `sales_order_address`.`country_id` FROM `sales_order_address` 
SELECT `customer_entity`.`entity_id`, `customer_entity`.`created_at`, SHA1(`customer_entity`.`email`) AS `email`, `customer_entity`.`store_id` FROM `customer_entity` 
SELECT `wishlist`.`wishlist_id`, `wishlist`.`customer_id` FROM `wishlist` 
SELECT `wishlist_item`.`wishlist_item_id`, `wishlist_item`.`added_at`, `wishlist_item`.`qty`, `wishlist_item`.`store_id`, `wishlist_item`.`wishlist_id`, `wishlist_item`.`product_id` FROM `wishlist_item` 

After

These are the queries after the limit was added:

SELECT `catalog_category_entity_text`.`value` AS `content` FROM `catalog_category_entity_text`
SELECT `catalog_product_entity_text`.`value` AS `content`, `eav_attribute`.`attribute_code` FROM `catalog_product_entity_text`
SELECT `cms_page`.`content` FROM `cms_page`
SELECT `cms_block`.`content` FROM `cms_block`
SELECT `setup_module`.`module` AS `module_name`, `setup_module`.`schema_version`, `setup_module`.`data_version` FROM `setup_module` LIMIT 1 
SELECT `store`.`store_id`, `store`.`code`, `store`.`group_id`, `store`.`name`, `store`.`is_active` FROM `store` LIMIT 1 
SELECT `store_website`.`website_id`, `store_website`.`code`, `store_website`.`name`, `store_website`.`default_group_id`, `store_website`.`is_default` FROM `store_website` LIMIT 1 
SELECT `store_group`.`group_id`, `store_group`.`website_id`, `store_group`.`name`, `store_group`.`default_store_id` FROM `store_group` LIMIT 1 
SELECT `catalog_product_entity`.`entity_id`, `catalog_product_entity`.`sku` FROM `catalog_product_entity` WHERE (catalog_product_entity.created_in <= '1748965680') AND (catalog_product_entity.updated_in > '1748965680') LIMIT 1 
SELECT `magento_banner_content`.`banner_content` AS `content` FROM `magento_banner_content`
SELECT `quote`.`entity_id`, `quote`.`customer_id`, `quote`.`store_id`, `quote`.`created_at`, `quote`.`converted_at`, `quote`.`is_active`, `quote`.`items_count`, `quote`.`items_qty`, `quote`.`orig_order_id` FROM `quote` LIMIT 1 
SELECT `review`.`review_id`, `review`.`created_at`, `review`.`entity_pk_value` FROM `review` LIMIT 1 
SELECT `rating_option_vote_aggregated`.`primary_id`, `rating_option_vote_aggregated`.`entity_pk_value`, `rating_option_vote_aggregated`.`store_id`, `rating_option_vote_aggregated`.`rating_id`, `rating_option_vote_aggregated`.`percent_approved` FROM `rating_option_vote_aggregated` LIMIT 1 
SELECT `sales_order`.`entity_id`, `sales_order`.`created_at`, `sales_order`.`customer_id`, `sales_order`.`status`, `sales_order`.`base_grand_total`, `sales_order`.`base_tax_amount`, `sales_order`.`base_shipping_amount`, SHA1(`sales_order`.`coupon_code`) AS `coupon_code`, `sales_order`.`store_id`, `sales_order`.`store_name`, `sales_order`.`base_discount_amount`, `sales_order`.`base_subtotal`, `sales_order`.`base_total_refunded`, `sales_order`.`shipping_method`, `sales_order`.`shipping_address_id`, SHA1(`sales_order`.`customer_email`) AS `customer_email`, `sales_order`.`base_total_online_refunded`, `sales_order`.`base_total_offline_refunded`, `sales_order`.`base_currency_code`, `sales_order`.`billing_address_id` FROM `sales_order` LIMIT 1 
SELECT `sales_order_item`.`item_id`, `sales_order_item`.`created_at`, `sales_order_item`.`name`, `sales_order_item`.`base_price`, `sales_order_item`.`qty_ordered`, `sales_order_item`.`order_id`, `sales_order_item`.`sku`, `sales_order_item`.`product_id`, `sales_order_item`.`store_id` FROM `sales_order_item` LIMIT 1 
SELECT `sales_order_address`.`entity_id`, `sales_order_address`.`customer_id`, `sales_order_address`.`city`, `sales_order_address`.`region`, `sales_order_address`.`country_id` FROM `sales_order_address` LIMIT 1 
SELECT `customer_entity`.`entity_id`, `customer_entity`.`created_at`, SHA1(`customer_entity`.`email`) AS `email`, `customer_entity`.`store_id` FROM `customer_entity` LIMIT 1 
SELECT `wishlist`.`wishlist_id`, `wishlist`.`customer_id` FROM `wishlist` LIMIT 1 
SELECT `wishlist_item`.`wishlist_item_id`, `wishlist_item`.`added_at`, `wishlist_item`.`qty`, `wishlist_item`.`store_id`, `wishlist_item`.`wishlist_id`, `wishlist_item`.`product_id` FROM `wishlist_item` LIMIT 1 

Contribution checklist (*)

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • README.md files for modified modules are updated and included in the pull request if any README.md predefined sections require an update
  • All automated tests passed successfully (all builds are green)


m2-assistant bot commented Jan 27, 2026

Hi @AlexRapatij. Thank you for your contribution!
Here are some useful tips on how you can test your changes using Magento test environment.
❗ Automated tests can be triggered manually with an appropriate comment:

  • @magento run all tests - run or re-run all required tests against the PR changes
  • @magento run <test-build(s)> - run or re-run specific test build(s)
    For example: @magento run Unit Tests

<test-build(s)> is a comma-separated list of build names.

Allowed build names are:
  1. Database Compare
  2. Functional Tests CE
  3. Functional Tests EE
  4. Functional Tests B2B
  5. Integration Tests
  6. Magento Health Index
  7. Sample Data Tests CE
  8. Sample Data Tests EE
  9. Sample Data Tests B2B
  10. Static Tests
  11. Unit Tests
  12. WebAPI Tests
  13. Semantic Version Checker

You can find more information about the builds here
ℹ️ Run only required test builds during development. Run all test builds before sending your pull request for review.


For more details, review the Code Contributions documentation.
Join Magento Community Engineering Slack and ask your questions in #github channel.

@AlexRapatij
Author

@magento run all tests

@AlexRapatij AlexRapatij changed the title from "Patch 2" to "Fix the analytics_collect_data cron job long run" on Jan 27, 2026
@AlexRapatij AlexRapatij reopened this Jan 27, 2026
@AlexRapatij AlexRapatij changed the title from "Fix the analytics_collect_data cron job long run" to "Performance fix the analytics_collect_data cron job long run" on Jan 27, 2026
@TuVanDev
Member

TuVanDev commented Jan 27, 2026

@AlexRapatij Thank you for your contribution. There is a test related to this limit value: the testValidate function in the ReportValidatorTest class (Test/Unit/ReportXml/DB/ReportValidatorTest.php). Could you modify that test as well?

$this->selectMock->expects($this->once())->method('limit')->with(0);
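
If the validator now calls limit(1), the expectation quoted above would presumably need to check for 1 instead of 0. The following framework-free sketch shows the same kind of check. SelectSpy and prepareValidationSelect are hypothetical stand-ins for the PHPUnit mock and the code under test, not the actual classes in the module.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for the mocked select object; records limit() calls. */
final class SelectSpy
{
    /** @var int[] */
    public array $limitCalls = [];

    public function limit(int $count): self
    {
        $this->limitCalls[] = $count;

        return $this;
    }
}

/** Hypothetical reduction of the validator step under test. */
function prepareValidationSelect(SelectSpy $select): SelectSpy
{
    // Assumed behaviour after this PR: cap the validation query at one row.
    return $select->limit(1);
}

$spy = prepareValidationSelect(new SelectSpy());

// Rough equivalent of expects($this->once())->method('limit')->with(1):
// limit() was called exactly once, with the argument 1.
var_dump($spy->limitCalls === [1]);
```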

@AlexRapatij
Author

@magento run all tests

@convenient
Contributor

@magento run Unit Tests, Functional Tests EE, Functional Tests CE, Functional Tests B2B, Database Compare
